Abstract:
Flash Friendly File System (F2FS) is becoming popular on mobile devices. However, the lack of an empirical and comprehensive analysis of F2FS characteristics hinders its effective use. In this paper, we present a set of comprehensive experimental studies on mobile devices and report several counterintuitive observations on F2FS, including imprecise hot/cold data separation, an unexpected trigger condition for background garbage collection (GC), the impact of fragmentation on read performance, and the way fragments and available space affect readahead. Based on these observations, we further provide several pilot solutions to improve the performance of these mobile devices. The objective is to inspire researchers and users to pay attention to F2FS characteristics and to further optimize its performance.
Abstract:
Modern discrete GPUs support Unified Virtual Memory (UVM), which simplifies GPU programming. However, UVM requires address translation on each memory access, which introduces significant performance overhead. In this work, we select various workloads and conduct experiments on GPU performance. Our investigation shows that many workloads have low L1 TLB hit ratios of less than 40% on average. For one workload, the hit ratio is as low as 15%, which leads to significant performance degradation. Through further analysis, we find that many common entries exist between neighboring private L1 TLBs, showing clear inter-TLB sharing behavior. To leverage this sharing, we propose a Neighboring Directory table based hardware scheme, named NeiDty. In NeiDty, L1 TLBs can probe physical addresses from neighboring L1 TLBs through a lightweight interconnect network. NeiDty uses neighboring directory tables to keep track of the entries shared among neighboring L1 TLBs. In addition, we find it better to update the address translation after two consecutive neighboring TLB hits than after a single hit. We run eight typical workloads with Gem5-GPU, and the results show that NeiDty increases the average L1 TLB hit ratio by 14% and improves average performance by 10%.
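The neighbor-probing idea in this abstract can be illustrated with a small software model. The Python sketch below is a toy under stated assumptions, not the NeiDty hardware design: the class `L1TLB`, the function `translate`, the parameter `promote_after`, the LRU replacement policy, and the linear scan over neighbors (which stands in for the neighboring directory table) are all hypothetical. It only shows the control flow of "local hit, else neighbor probe, else page walk", with a local fill after two consecutive neighbor hits.

```python
from collections import OrderedDict


class L1TLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical frame number
        self.neighbor_hits = {}       # virtual page number -> consecutive neighbor hits

    def lookup(self, vpn):
        """Local lookup; refreshes the LRU position on a hit."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        return None

    def peek(self, vpn):
        """Probe from a neighbor; does not disturb the LRU order."""
        return self.entries.get(vpn)

    def insert(self, vpn, pfn):
        """Fill an entry, evicting the least recently used one if full."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = pfn


def translate(tlbs, core, vpn, page_table, promote_after=2):
    """Return (pfn, source), where source is 'local', 'neighbor', or 'walk'."""
    tlb = tlbs[core]
    pfn = tlb.lookup(vpn)
    if pfn is not None:
        return pfn, "local"

    # Assumption: scanning every other TLB replaces the directory-table lookup.
    for other_id, other in enumerate(tlbs):
        if other_id == core:
            continue
        pfn = other.peek(vpn)
        if pfn is not None:
            count = tlb.neighbor_hits.get(vpn, 0) + 1
            if count >= promote_after:
                # Copy the translation locally only after repeated neighbor hits.
                tlb.insert(vpn, pfn)
                tlb.neighbor_hits.pop(vpn, None)
            else:
                tlb.neighbor_hits[vpn] = count
            return pfn, "neighbor"

    # Miss everywhere: fall back to the page table and fill the local TLB.
    pfn = page_table[vpn]
    tlb.insert(vpn, pfn)
    tlb.neighbor_hits.pop(vpn, None)
    return pfn, "walk"
```

With two TLBs and a one-entry page table, a first access from core 0 is served by a walk, the next two accesses from core 1 are served by the neighbor, and the fourth access from core 1 hits locally because the second neighbor hit triggered the local fill.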
Abstract:
Similar to other digital assets, deep neural network (DNN) models can suffer from piracy threats initiated by insider and/or outsider adversaries because of their inherent commercial value. DNN watermarking is a promising technique to mitigate this threat to intellectual property. This work focuses on black-box DNN watermarking, in which an owner can verify ownership only by issuing special trigger queries to a remote suspicious model. However, informed attackers, who are aware of the watermark and somehow obtain the triggers, could forge fake triggers to claim ownership, because the triggers have poor robustness and there is no correlation between the model and the owner's identity. This consideration calls for new watermarking methods that achieve a better trade-off in addressing this discrepancy. In this paper, we exploit frequency domain image watermarking to generate triggers and build our DNN watermarking algorithm accordingly. Since watermarking in the frequency domain offers high concealment and is robust to signal processing operations, the proposed algorithm is superior to existing schemes in resisting fraudulent claim attacks. In addition, extensive experimental results on 3 datasets and 8 neural networks demonstrate that the proposed DNN watermarking algorithm achieves similar performance on functionality metrics and better performance on security metrics when compared with existing algorithms.
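To make "frequency domain image watermarking" concrete, the following Python sketch hides one bit in a single DCT coefficient of a row of pixel values using quantization index modulation (QIM), a generic textbook technique. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names, the choice of coefficient index 3, and the quantization step of 16 are assumptions made for illustration.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * c[k]
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)))
    return out


def embed_bit(pixels, bit, coeff_index=3, step=16.0):
    """Snap one DCT coefficient to the nearest multiple of `step`
    whose index has the same parity as `bit` (hypothetical parameters)."""
    coeffs = dct(pixels)
    level = 2 * round((coeffs[coeff_index] / step - bit) / 2) + bit
    coeffs[coeff_index] = level * step
    return idct(coeffs)


def extract_bit(pixels, coeff_index=3, step=16.0):
    """Read the bit back as the parity of the quantized coefficient."""
    return round(dct(pixels)[coeff_index] / step) % 2
```

For an 8-sample row, `extract_bit(embed_bit(row, b))` returns `b` for both bit values, and no sample moves by more than half the step size, which is why a change made in the frequency domain can stay visually subtle. A real scheme would work on 2-D blocks and spread many bits across coefficients.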
Abstract:
The storage capacity of NAND Flash has increased through scaling down to smaller cell sizes and using multi-level storage technology, but data reliability is degraded by more severe retention errors. To ensure data reliability, error correction codes (ECC) are adopted, such as BCH and low-density parity check (LDPC) codes. However, BCH codes are insufficient when the raw bit error rates (RBER) caused by retention errors are high. As a result, BCH codes are inevitably replaced with LDPC codes, which have stronger error correction capability. Traditional LDPC codes are used to correct bit errors in the LSB and MSB pages independently. Unfortunately, the decoding latency of the two page types is significantly unbalanced: MSB pages incur much higher latency due to higher RBER, leading to suboptimal flash read performance. This paper proposes a cooperative error correction scheme, called CooECC, to reduce the LDPC decoding latency of the MSB page in NAND Flash. By exploiting data error characteristics introduced by retention errors, CooECC integrates the decoding result of the LSB page into the initial information of LDPC decoding for the MSB page, making it more accurate. This in turn enables decoding to converge faster. Simulation results show that for LDPC schemes with information lengths of 2KB and 4KB, the decoding latency can be reduced by up to 87% and 84%, respectively, when RBER is as high as 8.0 × 10⁻³.
Abstract:
Compared with single images, stereo image pairs can improve the performance of many tasks that benefit from the additional information obtained from a second viewpoint. Existing superpixel segmentation algorithms for stereo images mostly take single images as input and neglect the correspondence between the left and right views. In this work, we exploit the depth information between stereo image pairs and propose an end-to-end dual-attention fusion network for stereo images to generate parallax-consistent superpixels. We first use a deep convolutional network to extract the deep features of the stereo images. Then, to effectively use the additional information from the other view, the features of the left and right views are integrated by a parallax attention and channel attention mechanism. Finally, the stereo superpixels are generated by a differentiable clustering algorithm, which is end-to-end trainable with deep learning networks. Comprehensive experimental results demonstrate that our method outperforms state-of-the-art methods on the KITTI2015 and Cityscapes datasets.
Abstract:
Existing studies have uncovered significant Raw Bit Error Rate (RBER) variations among different layers of 3D flash memories due to manufacturing process variation. These RBER variations cause widely varying read latencies when data is read with traditional Low-Density Parity-Check (LDPC) codes designed for planar flash memories, which results in sub-optimal read performance of flash-based Solid-State Drives (SSDs). To investigate this latency diversity, this paper first performs a preliminary experiment and observes that LDPC read levels, which are proportional to latencies, increase at different speeds as data retention time grows. Then, by exploiting this observation, a Multi-Granularity LDPC (MG-LDPC) read method is proposed to adapt the level increase speed to each layer. Five LDPC engines with varied increase granularity are designed to meet the RBER speed requirements. Finally, two implementations of MG-LDPC are applied to assign LDPC engines to each flash layer, either in a fixed way or dynamically according to prior read levels. Experimental results show that the two proposed implementations can reduce SSD read response time by 21% and 47% on average, respectively.
CCS Concepts: Hardware → Temperature optimization; Network on chip; 3D integrated circuits.